Text Classification Using Symbolic Data Analysis
Abstract
In the real world, an operational text classification system usually works in an environment where human-annotated training documents are scarce even though there may be thousands of classes. In such an environment, text classifiers are probably more appropriate for practical systems than more complex learning models. Text classifiers are mainly used for free-flowing, unstructured text documents. Classification relies on a statistical feature-weighting method that involves a pre-processing step in which texts are reduced by eliminating digits, punctuation, hyphens, stop words, and high- and low-frequency words, and by applying stemming. This strategy cannot be applied to unstructured texts describing advertisements, since these texts give their descriptions in terms of attribute values. Because none of these text classifiers is useful for classifying such texts, the concept of symbolic data analysis is introduced. Symbolic Data Analysis (SDA) is a new domain in the area of knowledge discovery and data management, related to multivariate analysis, pattern recognition, databases, and artificial intelligence. A Symbolic Data Analysis method for classifying unstructured text documents, based on a symbolic database and querying processes, is proposed. The proposed technique appears to be an efficient way to classify texts in unstructured text documents and is introduced to obtain better results when dealing with such documents.
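The conventional pre-processing step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the stop-word list, the suffix-stripping `crude_stem` helper, and the `min_df` / `max_df_ratio` thresholds are all assumptions chosen for the example.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "and", "to", "for"}


def crude_stem(word):
    """Strip a few common English suffixes (a stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, drop digits/punctuation/hyphens, remove stop words, then stem."""
    # Every character that is not a lowercase letter or whitespace becomes a space.
    cleaned = re.sub(r"[^a-z\s]", " ", text.lower())
    return [crude_stem(w) for w in cleaned.split() if w not in STOP_WORDS]


def preprocess(docs, min_df=2, max_df_ratio=0.9):
    """Return token lists with very rare and very common terms removed."""
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(docs)
    keep = {t for t, c in df.items() if c >= min_df and c / n_docs <= max_df_ratio}
    return [[t for t in tokens if t in keep] for tokens in tokenized]


docs = [
    "The cats are running in 2 gardens.",
    "A cat walked to the well-kept garden!",
    "Dogs and cats played.",
]
print(preprocess(docs))
```

On attribute-value texts such as advertisements, a pipeline like this would discard digits and rare terms, which are often exactly the attribute values that carry the description. That is consistent with the abstract's argument that this strategy does not suit such texts.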
Similar resources
Critical Assessment of Poetic Imagery Translation in Nizami’s “Leili & Majnun” by James Atkinson
Poetry translation involves cognition, discourse, and action by and between humans and textual actors in physical and social settings. The aim of this study was to find out to what extent the non-native translator of Nizami Ganjavi’s “Leili and Majnun” could preserve the poetic imagery in its English translation. To this end, an innovative taxonomic model, which could investiga...
An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to a variety of library resources. However, classifying documents from a large amount of data is still an issue, and finding particular documents demands time and energy. Classifying similar documents into specific classes can reduce the time needed to search for the required data, particularly text documents. This is further facilitated by using Artificial...
A New Algorithm for Optimization of Fuzzy Decision Tree in Data Mining
Decision-tree algorithms provide one of the most popular methodologies for symbolic knowledge acquisition. The resulting knowledge, a symbolic decision tree along with a simple inference mechanism, has been praised for comprehensibility. The most comprehensible decision trees have been designed for perfect symbolic data. Classical crisp decision trees (DT) are widely applied to classification t...
Symbolic Representation and Classification of Logos
In this paper, a model for classification of logos based on symbolic representation of features is presented. The proposed model makes use of global features of logo images such as color, texture, and shape features for classification. The logo images are broadly classified into three different classes viz. logo image containing only text, an image with only symbol, and an image with both text ...
Journal title:
Volume, issue:
Pages: -
Publication date: 2014